Abstract. Given a word w and a Parikh vector $\mathcal{P}$, an abelian run of
period $\mathcal{P}$ in w is a maximal occurrence of a substring of w having abelian
period $\mathcal{P}$. We give an algorithm that finds all the abelian runs of period

1 Introduction

Computing maximal (non-extendable) repetitions in a string is a classical topic 
in the area of string algorithms (see for example [1] and references therein).
Detecting maximal repetitions of substrings, also called runs, gives information 
on the repetitive regions of a string, and is used in many applications, for example 
in the analysis of genomic sequences. 

Kolpakov and Kucherov [5] gave a linear time algorithm for computing all the 
runs in a word and conjectured that any word of length n contains fewer than n
runs. Bannai et al. [1] recently proved this conjecture using the notion of Lyndon
root of a run. 

Here we deal with a generalization of this problem to the commutative setting.
Recall that an abelian power is a concatenation of two or more words that
have the same Parikh vector, i.e., that have the same number of occurrences
of each letter of the alphabet. For example, aababa is an abelian square, since
aab and aba both have 2 a's and 1 b. When an abelian power occurs within
a string, one can search for its "maximal" occurrence by extending it to the
left and to the right character by character without violating the condition on
the number of occurrences of each letter. Following the approach of Constantinescu
and Ilie [2], we say that a Parikh vector $\mathcal{P}$ is an abelian period for a
word w over a finite ordered alphabet $\Sigma = \{a_1, a_2, \ldots, a_\sigma\}$ if w can be written
as $w = u_0 u_1 \cdots u_{k-1} u_k$ for some k > 2 where for 0 < i < k all the $u_i$'s have
the same Parikh vector $\mathcal{P}$ and the Parikh vectors of $u_0$ and $u_k$ are contained
in $\mathcal{P}$. Note that the factorization above is not necessarily unique. For example,
a · bba · bba · ε and ε · abb · abb · a (ε denotes the empty word) are two factorizations
of the word abbabba both corresponding to the abelian period (1,2). Moreover,
the same word can have different abelian periods.

In this paper we define an abelian run of period $\mathcal{P}$ in a word w as an occurrence
of a substring v of w such that v has abelian period $\mathcal{P}$ and this occurrence
can be extended neither to the left nor to the right by one letter into a substring
having the same abelian period $\mathcal{P}$.

For example, let w = ababaaa. Then the prefix ab · ab · a = w[1..5] has abelian
period (1,1) but it is not an abelian run since the prefix a · ba · ba · a = w[1..6]
also has abelian period (1,1). The latter, instead, is an abelian run of period
(1,1) in w.

Looking for abelian runs in a string can be useful to detect those regions 
in a string in which there is some kind of non-exact repetitiveness, for example 
regions in which there are several consecutive occurrences of a substring or its 
reverse. 

Matsuda et al. [6] recently presented an offline algorithm for computing all
abelian runs of a word of length n in $O(n^2)$ time. Note, however, that the
definition of abelian run in [6] is slightly different from the one we consider here.
We will comment on this in Section 3.

We present an online algorithm that, given a word w of length n over an
alphabet of cardinality $\sigma$, and a Parikh vector $\mathcal{P}$, returns all the abelian runs of
period $\mathcal{P}$ in w in time $O(n \times |\mathcal{P}|)$ and space $O(\sigma + |\mathcal{P}|)$.


2 Definitions and Notation 

Let $\Sigma = \{a_1, a_2, \ldots, a_\sigma\}$ be a finite ordered alphabet of cardinality $\sigma$ and let $\Sigma^*$
be the set of finite words over $\Sigma$. We let |w| denote the length of the word w.
Given a word w = w[0..n − 1] of length n > 0, we write w[i] for the (i + 1)-th
symbol of w and, for $0 \le i \le j < n$, we write w[i..j] for the substring of w from
the (i + 1)-th symbol to the (j + 1)-th symbol, both included. We let $|w|_a$ denote
the number of occurrences of the symbol $a \in \Sigma$ in the word w.

The Parikh vector of w, denoted by $\mathcal{P}_w$, counts the occurrences of each letter
of $\Sigma$ in w, that is, $\mathcal{P}_w = (|w|_{a_1}, \ldots, |w|_{a_\sigma})$. Notice that two words have the same
Parikh vector if and only if one word is a permutation (i.e., an anagram) of the 
other. 
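As a small illustration of this definition, the following Python sketch computes Parikh vectors as tuples and uses them to test the abelian square from the introduction. The function names, and the choice to pass the ordered alphabet as a string, are assumptions made for this example only.

```python
from collections import Counter

def parikh_vector(word, alphabet):
    """Count the occurrences of each letter of `alphabet` (in order) in `word`."""
    counts = Counter(word)
    return tuple(counts[letter] for letter in alphabet)

def is_abelian_square(word, alphabet):
    """True if `word` splits into two halves with equal Parikh vectors."""
    half, odd = divmod(len(word), 2)
    if odd or half == 0:
        return False
    left = parikh_vector(word[:half], alphabet)
    right = parikh_vector(word[half:], alphabet)
    return left == right

print(parikh_vector("aab", "ab"))         # (2, 1)
print(is_abelian_square("aababa", "ab"))  # True
```

Two words are anagrams of each other exactly when `parikh_vector` returns the same tuple for both.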

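Given the Parikh vector $\mathcal{P}_w$ of a word w, we let $\mathcal{P}_w[i]$ denote its i-th component
and $|\mathcal{P}_w|$ its norm, defined as the sum of its components. Thus, for $w \in \Sigma^*$
and $1 \le i \le \sigma$, we have $\mathcal{P}_w[i] = |w|_{a_i}$ and $|\mathcal{P}_w| = \sum_{i=1}^{\sigma} \mathcal{P}_w[i] = |w|$.

Finally, given two Parikh vectors $\mathcal{P}, \mathcal{Q}$, we write $\mathcal{P} \subset \mathcal{Q}$ if $\mathcal{P}[i] \le \mathcal{Q}[i]$ for
every $1 \le i \le \sigma$ and $|\mathcal{P}| < |\mathcal{Q}|$.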
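The containment relation can be sketched in Python as follows, assuming Parikh vectors are represented as equal-length tuples of non-negative integers (the function name `is_contained` is hypothetical). Note that the relation is strict: a vector is not contained in itself.

```python
def is_contained(p, q):
    """P ⊂ Q: every component of p is <= that of q, and the norm is strictly smaller."""
    if len(p) != len(q):
        raise ValueError("Parikh vectors must be over the same alphabet")
    return all(x <= y for x, y in zip(p, q)) and sum(p) < sum(q)

print(is_contained((1, 1), (2, 2)))  # True
print(is_contained((2, 2), (2, 2)))  # False: norms are equal
print(is_contained((3, 0), (2, 2)))  # False: first component too large
```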

Definition 1 (Abelian period [2]). A Parikh vector $\mathcal{P}$ is an abelian period
for a word w if $w = u_0 u_1 \cdots u_{k-1} u_k$, for some k > 2, where
$\mathcal{P}_{u_0} \subset \mathcal{P}_{u_1} = \cdots = \mathcal{P}_{u_{k-1}} \supset \mathcal{P}_{u_k}$, and $\mathcal{P}_{u_1} = \mathcal{P}$.

Note that since the Parikh vectors of $u_0$ and $u_k$ are strictly contained in $\mathcal{P}$, it
follows that $|u_0|, |u_k| < |\mathcal{P}|$. We call $u_0$ and $u_k$ respectively the head and the
tail of the abelian period. Note that in [2] the abelian period is characterized by
$|u_0|$ and $|\mathcal{P}|$, thus we will sometimes use the notation (h, p) for an abelian period
of norm p and head length h of a word w. Notice that the length t of the tail is
uniquely determined by h, p and n = |w|, namely t = (n − h) mod p.
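To make the head, block and tail structure concrete, here is a hedged Python sketch that checks whether a word factorizes with a given Parikh vector and head length. The function name `factorize` and its return convention (number of full blocks and tail length, or `None`) are assumptions of this example; the caller decides how many full blocks to require. The word is assumed to contain only letters of the given alphabet.

```python
from collections import Counter

def factorize(word, period, alphabet, head):
    """Return (number of full blocks, tail length) if `word` splits into a head of
    length `head`, blocks with Parikh vector `period`, and a tail; else None."""
    period = tuple(period)
    p = sum(period)
    if p == 0 or not 0 <= head < p or head > len(word):
        return None

    def pv(s):
        counts = Counter(s)
        return tuple(counts[a] for a in alphabet)

    def contained(v):
        return all(x <= y for x, y in zip(v, period)) and sum(v) < p

    if not contained(pv(word[:head])):
        return None
    blocks = (len(word) - head) // p
    for m in range(blocks):
        start = head + m * p
        if pv(word[start:start + p]) != period:
            return None
    tail = word[head + blocks * p:]      # its length is (n - head) mod p
    return (blocks, len(tail)) if contained(pv(tail)) else None

print(factorize("abbabba", (1, 2), "ab", head=1))  # (2, 0): a . bba . bba . (empty)
print(factorize("abbabba", (1, 2), "ab", head=0))  # (2, 1): (empty) . abb . abb . a
```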

Definition 2 (Abelian repetition). A substring w[i..j] is an abelian repetition
with period length p if $j - i + 1$ is a multiple of p, $j - i + 1 \ge 2p$ and there
exists a Parikh vector $\mathcal{P}$ of norm p such that $\mathcal{P}_{w[i+kp..i+(k+1)p-1]} = \mathcal{P}$ for every
$0 \le k < (j - i + 1)/p$.

An abelian repetition w[i..j] with period length p such that $j - i + 1 = 2p$
is called an abelian square. An abelian repetition w[i..j] of period length p of a
string w is maximal if:

1. $\mathcal{P}_{w[i-p..i-1]} \neq \mathcal{P}_{w[i..i+p-1]}$ or $i - p < 0$;

2. $\mathcal{P}_{w[j-p+1..j]} \neq \mathcal{P}_{w[j+1..j+p]}$ or $j + p \ge n$.
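The two maximality conditions can be checked directly. The sketch below is a naive Python illustration, assuming 0-based inclusive positions and reading the conditions as "the block of length p immediately to the left (resp. right) either does not fit in w or has a different Parikh vector". The helper names are hypothetical, and sorted letters are used as a canonical stand-in for the Parikh vector.

```python
def is_abelian_repetition(w, i, j, p):
    """True if w[i..j] is a concatenation of at least two blocks of length p
    that are permutations of one another."""
    length = j - i + 1
    if p <= 0 or length % p != 0 or length < 2 * p:
        return False
    first = sorted(w[i:i + p])
    return all(sorted(w[s:s + p]) == first for s in range(i, j + 1, p))

def is_maximal_abelian_repetition(w, i, j, p):
    """True if w[i..j] is an abelian repetition that cannot be extended by a
    whole block of length p on either side."""
    if not is_abelian_repetition(w, i, j, p):
        return False
    n = len(w)
    left_blocked = i - p < 0 or sorted(w[i - p:i]) != sorted(w[i:i + p])
    right_blocked = j + p >= n or sorted(w[j - p + 1:j + 1]) != sorted(w[j + 1:j + p + 1])
    return left_blocked and right_blocked

w = "abaababaabbb"
print(is_maximal_abelian_repetition(w, 1, 4, 2))  # False: ba.ab extends to ba.ab.ab
print(is_maximal_abelian_repetition(w, 1, 6, 2))  # True
```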

We now give the definition of an abelian run. Let v = w[b..e], $0 \le b \le e \le |w| - 1$,
be an occurrence of a substring in w and suppose that v has an abelian
period $\mathcal{P}$, with head length h and tail length t. Then we denote this occurrence
by the tuple (b, h, t, e).

Definition 3. Let w be a word. An occurrence (b, h, t, e) of a substring of w
starting at position b, ending at position e, and having abelian period $\mathcal{P}$ with head
length h and tail length t is called left-maximal (resp. right-maximal) if there
does not exist an occurrence of a substring (b − 1, h', t', e) (resp. (b, h', t', e + 1))
with the same abelian period $\mathcal{P}$. An occurrence (b, h, t, e) is called maximal if it
is both left-maximal and right-maximal.

This definition leads to the one of abelian run.

Definition 4. An abelian run is a maximal occurrence (b, h, t, e) of a substring
with abelian period $\mathcal{P}$ of norm p such that $(e - b - h - t + 1) \ge 2p$ (see Fig. 1).


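[Figure: the substring w[b..e] drawn as a head of length h, followed by consecutive
blocks of Parikh vector $\mathcal{P}$ spanning $e - b - h - t + 1$ positions, followed by a tail of length t.]

Fig. 1. The tuple (b, h, t, e) denotes an occurrence of a substring starting at position b,
ending at position e, and having abelian period $\mathcal{P}$ with head length h and tail length
t.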
The next result limits the number of abelian runs starting at each position
in a word.

Lemma 5. Let w be a word. Given a Parikh vector $\mathcal{P}$, there is at most one
abelian run with abelian period $\mathcal{P}$ starting at each position of w.

Proof. If two abelian runs start at the same position, the one with the shortest
head cannot be maximal. □

Corollary 6. Let w be a word. Given a Parikh vector $\mathcal{P}$, for every position i in
w there are at most $|\mathcal{P}|$ abelian runs with period $\mathcal{P}$ overlapping at i.

The next lemma shows that a left-maximal abelian substring at the right of 
another left-maximal abelian substring starting at position i in a word w cannot 
begin at a position smaller than i. 

Lemma 7. If $(b_1, h_1, 0, e_1)$ and $(b_2, h_2, 0, e_2)$ are two left-maximal occurrences
of substrings with the same abelian period $\mathcal{P}$ of a word w such that $e_1 < e_2$ and
$b_1 > e_1 - 2 \times |\mathcal{P}| + 1$ and $b_2 > e_2 - 2 \times |\mathcal{P}| + 1$, then $b_1 \le b_2$.

Proof. If $b_2 < b_1$ then since $e_2 > e_1$, $w[b_1..b_1 + h_1 - 1]$ is a substring of
$w[b_2..b_2 + h_2 - 1]$. Thus $\mathcal{P}_{w[b_1..b_1+h_1-1]} \subset \mathcal{P}_{w[b_2..b_2+h_2-1]} \subset \mathcal{P}$, which implies
that $\mathcal{P}_{w[b_1-1..b_1+h_1-1]} \subset \mathcal{P}$, meaning that $(b_1, h_1, 0, e_1)$ is not left-maximal: a
contradiction. □

We recall the following proposition, which shows that if we can extend the 
abelian period with the longest tail of a word w when adding a symbol a, then 
we can extend all the other abelian periods with shorter tail. 

Proposition 8 ([!]). Suppose that a word w has s abelian periods $(h_1, p_1) <
(h_2, p_2) < \cdots < (h_s, p_s)$ such that $(|w| - h_i) \bmod p_i = t > 0$ for every $1 \le i \le s$.
If for a letter $a \in \Sigma$, $(h_1, p_1)$ is an abelian period of wa, then $(h_2, p_2), \ldots, (h_s, p_s)$
are also abelian periods of wa.

We want to give an algorithm that, given a string w and a Parikh vector $\mathcal{P}$,
returns all the abelian runs of w having abelian period $\mathcal{P}$.

3 Previous Work 

In [6], the authors presented an algorithm that computes all the abelian runs
of a string w of length n in $O(n^2)$ time and space complexity. They consider
that a substring w[i − h..j + t] is an abelian run if w[i..j] is a maximal abelian
repetition with period length p and $h, t \ge 0$ are the largest integers satisfying
$\mathcal{P}_{w[i-h..i-1]} \subset \mathcal{P}_{w[i..i+p-1]}$ and $\mathcal{P}_{w[j+1..j+t]} \subset \mathcal{P}_{w[i..i+p-1]}$. Their algorithm works
as follows. First, it computes all the abelian squares using the algorithm of [3].
For each $0 \le i \le n - 1$, it computes a set $L_i$ of integers such that

$L_i = \{\, j \mid \mathcal{P}_{w[i-j..i]} = \mathcal{P}_{w[i+1..i+j+1]},\ 0 \le j \le \min\{i + 1, n - i\} \,\}$.
The $L_i$'s are stored in a two-dimensional boolean array L of size $\lfloor n/2 \rfloor \times (n - 1)$:
L[j, i] = 1 if $j \in L_i$ and L[j, i] = 0 otherwise. An example of array L is given in
Figure 2. All entries in L are initially unmarked. Then, for each $1 \le j \le \lfloor n/2 \rfloor$,
all maximal abelian repetitions of period length j are computed in O(n). The
j-th row of L is scanned in increasing order of the column index. When an
unmarked entry L[j, i] = 1 is found then the largest non-negative integer k such
that L[j, i + pj + 1] = 1, for $1 \le p \le k$, is computed. This gives a maximal
abelian repetition with period length j starting at position i − j + 1 and ending
at position i + (k + 1)j. Meanwhile all entries L[j, i + pj + 1], for $-1 \le p \le k$,
are marked. Thus all abelian repetitions are computed in $O(n^2)$ time. It remains
to compute the length of their heads and tails. This cannot be done naively,
otherwise it would lead to an $O(n^3)$ time complexity overall. Instead, for each
$0 \le i \le n - 1$, let $T_i$ be the set of positive integers such that for each $j \in T_i$ there
exists a maximal abelian repetition of period j and starting at position i − j + 1.
Elements of $T_i$ are processed in increasing order. Let $j_k$ denote the k-th smallest
element of $T_i$. Let $h_k$ denote the length of the head of the abelian run computed
from the abelian repetition of period $j_k$. Then $h_k$ can be computed from $h_{k-1}$,
$j_{k-1}$ and $j_k$ as follows. Two cases can arise:

1. If k = 0 or $j_{k-1} + h_{k-1} \le j_k$, then $h_k$ can be computed by comparing the
Parikh vector $\mathcal{P}_{w[i-j_k-p..i-j_k]}$ for increasing values of p from 0 up to $h_k + 1$,
with the Parikh vector $\mathcal{P}_{w[i-j_k+1..i]}$.

2. If $j_{k-1} + h_{k-1} > j_k$, then
$\mathcal{P}_{w[i-j_{k-1}-h_{k-1}+1..i-j_k]}$ can be computed from $\mathcal{P}_{w[i-j_{k-1}-h_{k-1}+1..i-j_{k-1}]}$. Then,
$h_k$ is computed by comparing the Parikh vector $\mathcal{P}_{w[i-j_{k-1}-h_{k-1}+1-p..i-j_k]}$
for increasing values of p from 0 up to $h_k + j_k - h_{k-1} - j_{k-1} + 1$.

This can be done in O(n) time. The lengths of the tails can be computed similarly.
Overall, all the runs can be computed in time and space $O(n^2)$.

Fig. 2. An example of array L for w = abaababaabbb. L[3, 6] = 1, which means that
$\mathcal{P}_{w[3..6]} = \mathcal{P}_{w[7..10]}$.
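To illustrate the sets $L_i$ used by this offline method, here is a quadratic-time Python sketch (for a constant-size alphabet) that builds them by growing both windows around position i one letter at a time. This is not the procedure of [6]: the function name `neighbour_sets` is hypothetical, and j is restricted so that both windows w[i − j..i] and w[i + 1..i + j + 1] fit inside w, which is an assumption of this example.

```python
from collections import Counter

def neighbour_sets(w):
    """L[i] = set of j such that w[i-j..i] and w[i+1..i+j+1] have equal Parikh vectors."""
    n = len(w)
    L = []
    for i in range(n):
        diff = Counter()   # left-window counts minus right-window counts
        matches = set()
        for j in range(min(i, n - i - 2) + 1):
            diff[w[i - j]] += 1        # left window grows to w[i-j..i]
            diff[w[i + j + 1]] -= 1    # right window grows to w[i+1..i+j+1]
            if not any(diff.values()):
                matches.add(j)
        L.append(matches)
    return L

L = neighbour_sets("abaababaabbb")
print(L[6])  # {3}: w[3..6] = abab and w[7..10] = aabb are anagrams
```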


This previous method works offline: it needs to know the whole string before
reporting any abelian run. We will now give what we call an online method,
meaning that we will be able to report the abelian runs ending at position i − 1
of a string w when processing position i. However, this method is restricted to
a given Parikh vector.


4 A Method for Computing Abelian Runs of a Word 
with a Given Parikh Vector 

4.1 Algorithm 

Positions of w are processed in increasing order. Assume that when processing
position i we know all the, at most $|\mathcal{P}|$, abelian substrings ending at position i − 1.
At each position i we check whether $\mathcal{P}_{w[i-|\mathcal{P}|+1..i]} = \mathcal{P}$. If so, all abelian substrings
ending at position i − 1 can be extended and thus become abelian substrings
ending at position i. Otherwise, if $\mathcal{P}_{w[i-|\mathcal{P}|+1..i]} \neq \mathcal{P}$, abelian substrings
ending at position i − 1 are processed in decreasing order of tail length. When an
abelian substring cannot be extended it is considered as an abelian run candidate.
As soon as an abelian substring ending at position i − 1 can be extended, all
the others (with smaller tail length) can be extended too: they all become abelian
substrings ending at position i. At most one candidate (with the smallest starting
position) can be output at each position.


4.2 Implementation 

The algorithm Runs($\mathcal{P}$, w) given below computes all the abelian runs with
Parikh vector $\mathcal{P}$ in the word w. It uses:

— function Find($\mathcal{P}$, w), which returns the ending position of the first occurrence
of Parikh vector $\mathcal{P}$ in w or |w| + 1 if such an occurrence does not
exist;

— function FindHead(w, i, $\mathcal{P}$), which returns the leftmost position j < i such
that $\mathcal{P}_{w[j..i-1]} \subset \mathcal{P}$ or i if such a substring does not exist;

— function Min(B) that returns the smallest element of the integer array B.

Positions of w are processed in increasing order (Lines 4–21). We will now
describe the situation when processing position i of w:

— array B stores the starting positions of abelian substrings ending at position
i − 1 for the different $|\mathcal{P}|$ tail lengths (B is considered as a circular array);

— $t_0$ is the index in B of the possible abelian substring with a tail of length 0
ending at position i.

All the values of the array B are initially set to |w|. Then, when processing
position i of w, for $0 \le k < |\mathcal{P}|$ and $k \neq t_0$, if B[k] = b < |w| then w[b..i − 1] is an
abelian substring with Parikh vector $\mathcal{P}$ with tail length $((t_0 - k + |\mathcal{P}|) \bmod |\mathcal{P}|) - 1$.
Otherwise, if B[k] = |w| then it means that there is no abelian substring in w
ending at position i − 1 with tail length $((t_0 - k + |\mathcal{P}|) \bmod |\mathcal{P}|) - 1$.
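The correspondence between an index of the circular array B and a tail length can be sketched in Python as below. The function name `tail_length` is hypothetical; the special case for k = t_0 (tail length |P| − 1) follows GetRun (Algorithm 2) rather than the formula above, which is stated only for k ≠ t_0.

```python
def tail_length(k, t0, p):
    """Tail length, at position i - 1, of the abelian substring whose start is
    stored in B[k], where t0 is the current index of tail length 0 at position i
    and p is the norm of the Parikh vector."""
    if k == t0:
        return p - 1                 # as in GetRun when tail = t0
    return (t0 - k + p) % p - 1

# State taken from the worked example when processing i = 8:
# B = (0, 12, 0, 1), t0 = 0, p = 4.
print(tail_length(3, 0, 4))  # 0: w[1..7] ends with the full block w[4..7]
print(tail_length(2, 0, 4))  # 1: w[0..7] ends with the block w[3..6] plus w[7]
print(tail_length(0, 0, 4))  # 3: w[0..7] ends with the block w[1..4] plus w[5..7]
```

For every index k, the same value is given by the single expression `(t0 - k - 1) % p`.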

The algorithm Runs($\mathcal{P}$, w) uses two other functions:

— function GetTail(tail, $t_0$, p), which returns $(t_0 - tail + p) \bmod p$ which is
the length of the tail for the abelian substring ending at position i − 1 and
starting in B[tail];

— function GetRun(B, tail, $t_0$, e, p), which returns the abelian substring
(B[tail], h, t, e).

If $\mathcal{P}_{w[i-|\mathcal{P}|+1..i]} = \mathcal{P}$ (Line 6) then all abelian substrings ending at position
i − 1 can be extended (see Fig. 3). Either this occurrence does not extend a
previous occurrence at position $i - |\mathcal{P}|$ (Line 7): the starting position has to be
stored in B (Line 8), or this occurrence extends a previous occurrence at position
$i - |\mathcal{P}|$ and the starting position is already stored in the array B.


Fig. 3. If $\mathcal{P}_{w[i-|\mathcal{P}|+1..i]} = \mathcal{P}$ then $\mathcal{P}_{w[j..i]} \subset \mathcal{P}$ for $i - |\mathcal{P}| + 1 < j \le i$.


If $\mathcal{P}_{w[i-|\mathcal{P}|+1..i]} \neq \mathcal{P}$ (Lines 9–21) then abelian substrings ending at position
i − 1 are processed in decreasing order of tail length. To do that, the circular
array B is processed in increasing order of index starting from $t_0$ (Lines 11–19).

Let tail be the current index in array B. At first, tail is set to $t_0$ (Line 10).
In this case there is no need to check if there is an abelian substring with tail
length 0 ending at position i (since it has been detected in Line 6) and thus
$(B[t_0], h, |\mathcal{P}| - 1, i - 1)$ is considered as an abelian substring candidate (Line 15)
and array B is updated (Line 16) since $(B[t_0], h, 0, i)$ is not an abelian substring.

When $tail \neq t_0$, let t = GetTail(tail, $t_0$, $|\mathcal{P}|$). If $\mathcal{P}_{w[i-t+1..i]} \not\subset \mathcal{P}$ then
(B[tail], h, t, i − 1) is considered as an abelian substring candidate (Line 15) and
array B is updated (Line 16) since (B[tail], h, t + 1, i) is not an abelian substring.
If $\mathcal{P}_{w[i-t+1..i]} \subset \mathcal{P}$ then, for $tail \le k \le (t_0 - 1 + |\mathcal{P}|) \bmod |\mathcal{P}|$, there exist $h'_k, t'_k$ such that
$(B[k], h'_k, t'_k, i)$ is an abelian substring. It comes directly from Prop. 8.

At each iteration of the loop in Lines 11–19, b is either equal to |w| or to
the position of the leftmost abelian run ending at position i − 1. Thus a new
candidate is found if its starting position is smaller than b (Lines 14–15). It comes
directly from Lemma 7.


Example

Let us see the behaviour of the algorithm on $\Sigma = \{a, b\}$, w = abaababaabbb
and $\mathcal{P} = (2, 2)$:
Algorithm 1: GetTail(tail, $t_0$, p)

    return $(t_0 - tail + p) \bmod p$


Algorithm 2: GetRun(B, tail, $t_0$, e, p)

    b ← B[tail]
    if tail = $t_0$ then
        t ← p − 1
    else
        t ← GetTail(tail, $t_0$, p) − 1
    h ← (e − t − b + 1) mod p
    return (b, h, t, e)


Algorithm 3: Runs($\mathcal{P}$, w)

 1  j ← Find($\mathcal{P}$, w)
 2  (B, $t_0$) ← ($|w|^{|\mathcal{P}|}$, 0)
 3  B[$t_0$] ← FindHead(w, j − $|\mathcal{P}|$ + 1, $\mathcal{P}$)
 4  for i ← j + 1 to |w| do
 5      $t_0$ ← ($t_0$ + 1) mod $|\mathcal{P}|$
 6      if i < |w| and $\mathcal{P}_{w[i-|\mathcal{P}|+1..i]} = \mathcal{P}$ then
 7          if B[$t_0$] = |w| then
 8              B[$t_0$] ← FindHead(w, i − $|\mathcal{P}|$ + 1, $\mathcal{P}$)
 9      else
10          (b, tail) ← (|w|, $t_0$)
11          repeat
12              if B[tail] ≠ |w| then
13                  if tail = $t_0$ or i = |w| or $\mathcal{P}_{w[i-\mathrm{GetTail}(tail, t_0, |\mathcal{P}|)+1..i]} \not\subset \mathcal{P}$ then
14                      if B[tail] ≤ b then
15                          (b, h, t, e) ← GetRun(B, tail, $t_0$, i − 1, $|\mathcal{P}|$)
16                      B[tail] ← |w|
17                  else break
18              tail ← (tail + 1) mod $|\mathcal{P}|$
19          until tail = $t_0$
20          if Min(B) > b and e − t − h − b + 1 > $|\mathcal{P}|$ then
21              Output (b, h, t, e)
i = 4, B = (12, 12, 12, 12), $t_0$ = 0
    B[0] = 0, B = (0, 12, 12, 12)

i = 5
    $t_0$ = 1
    $\mathcal{P}_{w[2..5]} \neq \mathcal{P}$
    (b, tail) = (12, 1)
    tail = 2
    tail = 3
    tail = 0

i = 6
    $t_0$ = 2
    $\mathcal{P}_{w[3..6]} = \mathcal{P}$
    B[2] = 0, B = (0, 12, 0, 12)

i = 7
    $t_0$ = 3
    $\mathcal{P}_{w[4..7]} = \mathcal{P}$
    B[3] = 1, B = (0, 12, 0, 1)

i = 8
    $t_0$ = 0
    $\mathcal{P}_{w[5..8]} \neq \mathcal{P}$
    (b, tail) = (12, 0)
    (b, h, t, e) = (0, 1, 3, 7)
    B[0] = 12, B = (12, 12, 0, 1)
    tail = 1
    tail = 2

i = 9
    $t_0$ = 1
    $\mathcal{P}_{w[6..9]} = \mathcal{P}$
    B[1] = 3, B = (12, 3, 0, 1)

i = 10
    $t_0$ = 2
    $\mathcal{P}_{w[7..10]} = \mathcal{P}$
    B[2] ≠ 12

i = 11
    $t_0$ = 3
    $\mathcal{P}_{w[8..11]} \neq \mathcal{P}$
    (b, tail) = (12, 3)
    (b, h, t, e) = (1, 3, 3, 10)
    B[3] = 12, B = (12, 3, 0, 12)
    tail = 0
    tail = 1

i = 12
    $t_0$ = 0
    i ≥ 12
    (b, tail) = (12, 0)
    tail = 1
    (b, h, t, e) = (3, 3, 2, 11)
    B[1] = 12, B = (12, 12, 0, 12)
    tail = 2
    (b, h, t, e) = (0, 3, 1, 11)
    B[2] = 12, B = (12, 12, 12, 12)
    tail = 3
    tail = 0
    Output (0, 3, 1, 11)
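The trace above can be reproduced with the following Python sketch of Algorithms 1–3. It is a transcription under stated assumptions, not the authors' implementation: the function name `abelian_runs` and its helpers are hypothetical, the comparison in Line 14 is read as "≤", and Parikh vectors are recounted naively for each comparison, so the sketch does not achieve the time bound of Theorem 9. The word is assumed to use only letters of the given alphabet.

```python
def abelian_runs(w, period, alphabet):
    """Return the abelian runs (b, h, t, e) of w with Parikh vector `period`."""
    period = tuple(period)
    p, n = sum(period), len(w)
    if p == 0:
        return []

    def pv(lo, hi):
        # Parikh vector of w[lo..hi], both ends included (naive recount).
        piece = w[lo:hi + 1]
        return tuple(piece.count(a) for a in alphabet)

    def contained(v):
        # v is componentwise <= period and has a strictly smaller norm.
        return all(x <= y for x, y in zip(v, period)) and sum(v) < p

    def find_head(i):
        # Leftmost j < i such that w[j..i-1] is contained in period, else i.
        j = i
        while j > 0 and contained(pv(j - 1, i - 1)):
            j -= 1
        return j

    def get_tail(tail, t0):
        return (t0 - tail + p) % p

    def get_run(B, tail, t0, e):
        b = B[tail]
        t = p - 1 if tail == t0 else get_tail(tail, t0) - 1
        h = (e - t - b + 1) % p
        return (b, h, t, e)

    # Find: ending position of the first window whose Parikh vector is period.
    j = next((e for e in range(p - 1, n) if pv(e - p + 1, e) == period), None)
    if j is None:
        return []

    B, t0 = [n] * p, 0
    B[t0] = find_head(j - p + 1)
    runs = []
    for i in range(j + 1, n + 1):
        t0 = (t0 + 1) % p
        if i < n and pv(i - p + 1, i) == period:
            if B[t0] == n:
                B[t0] = find_head(i - p + 1)
            continue
        b, tail, candidate = n, t0, None
        while True:  # repeat ... until tail = t0
            if B[tail] != n:
                blocked = (tail == t0 or i == n
                           or not contained(pv(i - get_tail(tail, t0) + 1, i)))
                if not blocked:
                    break  # this substring and all later ones can be extended
                if B[tail] <= b:
                    candidate = get_run(B, tail, t0, i - 1)
                    b = candidate[0]
                B[tail] = n
            tail = (tail + 1) % p
            if tail == t0:
                break
        if candidate is not None:
            b, h, t, e = candidate
            if min(B) > b and e - t - h - b + 1 > p:
                runs.append(candidate)
    return runs

print(abelian_runs("abaababaabbb", (2, 2), "ab"))  # [(0, 3, 1, 11)]
```

On the word ababaaa from the introduction with Parikh vector (1,1), the same function returns [(0, 1, 1, 5)], that is, the prefix a · ba · ba · a of length 6 in 0-based positions.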

4.3 Correctness and Complexity 

Theorem 9. The algorithm Runs($\mathcal{P}$, w) computes all the abelian runs with Parikh
vector $\mathcal{P}$ in a string w of length n in time $O(n \times |\mathcal{P}|)$ and additional space
$O(\sigma + |\mathcal{P}|)$.

Proof. The correctness of the algorithm comes from Corollary 6, Lemma 7 and
Prop. 8. The loop in lines 4–21 iterates at most n times. The loop in lines 11–19
iterates at most $|\mathcal{P}|$ times. The instructions in lines 6, 8 and 13 regarding
the comparison of Parikh vectors can be performed in O(n) time overall, independently
of the alphabet size, by maintaining the Parikh vector of a sliding
window of length $|\mathcal{P}|$ on w and a counter r of the number of differences between
this Parikh vector and $\mathcal{P}$. At each sliding step, from $w[i - |\mathcal{P}|..i - 1]$ to
$w[i - |\mathcal{P}| + 1..i]$, the counters of the characters $w[i - |\mathcal{P}|]$ and w[i] are updated,
compared to their counterparts in $\mathcal{P}$, and r is updated accordingly. The additional
space comes from the Parikh vector and from the array B, which has $|\mathcal{P}|$ elements. □
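The sliding-window technique mentioned in the proof can be sketched in Python as follows. The sketch covers only the equality test of Line 6 (not the containment tests), assumes w uses only letters of the given alphabet and that the norm of the Parikh vector is at least 1, and uses the hypothetical name `matching_window_ends`. The counter r holds the number of letters whose count in the current window differs from the target, so each step costs constant time.

```python
def matching_window_ends(w, period, alphabet):
    """Return every position i such that w[i-p+1..i] has Parikh vector `period`."""
    target = dict(zip(alphabet, period))
    p = sum(period)
    counts = dict.fromkeys(alphabet, 0)
    # r = number of letters whose window count differs from the target count.
    r = sum(1 for a in alphabet if target[a] != 0)

    def bump(letter, delta):
        nonlocal r
        was_equal = counts[letter] == target[letter]
        counts[letter] += delta
        is_equal = counts[letter] == target[letter]
        r += int(was_equal) - int(is_equal)

    ends = []
    for i, letter in enumerate(w):
        bump(letter, +1)          # w[i] enters the window
        if i >= p:
            bump(w[i - p], -1)    # w[i-p] leaves the window
        if i >= p - 1 and r == 0:
            ends.append(i)
    return ends

print(matching_window_ends("abaababaabbb", (2, 2), "ab"))  # [4, 6, 7, 9, 10]
```

These are exactly the positions i at which the worked example reports a window equal to (2,2), including the first occurrence found at i = 4.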

5 Conclusions 

We gave an algorithm that, given a word w of length n and a Parikh vector
$\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in w in time $O(n \times |\mathcal{P}|)$ and space
$O(\sigma + |\mathcal{P}|)$. The algorithm works in an online manner. To the best of our
knowledge, this is the first algorithm solving the problem of searching for all the
abelian runs having a given period.

We believe that further combinatorial results on the structure of the abelian 
runs in a word could lead to new algorithms. 

One of the reviewers of this submission pointed out that our algorithm can 
be modified in order to achieve time complexity O(n). Due to the limited time 
we had for preparing the final version of this paper, we did not include such 
improvement here. We will provide the details in a forthcoming full version of 
the paper. By the way, we warmly thank the reviewer for his comments. 